ABSTRACT

Suppose there is a large collection of items, each with an associated
cost and an inherent utility that is revealed only once we commit to
selecting it. Given a budget on the cumulative cost of the selected
items, how can we pick a subset of maximal value? This task generalizes
several important problems such as multi-armed bandits, active search
and the knapsack problem. We present an algorithm, GP-Select, which
utilizes prior knowledge about similarity between items, expressed as a
kernel function. GP-Select uses Gaussian process prediction to balance
exploration (estimating the unknown value of items) and exploitation
(selecting items of high value). We extend GP-Select to discover sets
that simultaneously have high utility and are diverse. Our preference
for diversity can be specified as an arbitrary monotone submodular
function that quantifies the diminishing returns obtained when selecting
similar items. Furthermore, we exploit the structure of the model
updates to achieve an order of magnitude (up to 40X) speedup in our
experiments without resorting to approximations. We provide strong
guarantees on the performance of GP-Select and apply it to three
real-world case studies of industrial relevance: (1) refreshing a
repository of prices in a Global Distribution System for the travel
industry, (2) identifying diverse, binding-affine peptides in a vaccine
design task and (3) maximizing clicks in a web-scale recommender system
by recommending items to users.

Categories and Subject Descriptors 

H.2.8 [Database Management]: Database Applications - Data Mining;
G.3 [Probability and Statistics]: Experimental Design

Keywords 

Design of experiments, Active search, Active learning, Kernel methods,
Recommender systems


1. INTRODUCTION 

Consider a large collection of items, each having an inherent value and
an associated cost. We seek to select a subset of maximal value, subject
to a constraint on the cumulative cost of the selected items. If we know
the items' values and costs, this is just the classical knapsack problem
- which is NP-hard, but can be near-optimally solved, e.g., using
dynamic programming. But what if we do not know the values? Concretely,
we consider the setting where we can choose an item, observe a noisy
estimate of its value, then choose and evaluate a second item and so on,
until our budget is exhausted. To achieve non-trivial performance, we
must be able to make predictions about the value of non-selected items
given the observations made so far. Hence, we assume that we are given
some information about the similarity of items (e.g., via features),
whereby similar items are expected to yield similar value. As a
motivating application, consider experimental design, where we may need
to explore a design space and wish to identify a set of near-optimal
designs, evaluating one design at a time. In the early stages of medical
drug development, for example, candidate compounds are subject to
various tests and a fixed number of them are promoted to the next stage
for animal/human testing. Even the initial tests are expensive, and the
goal is to reduce the number of compounds on which these tests are
conducted while still selecting a good set of compounds to promote to
the next level. Another application is recommender systems, where for a
given customer, we may seek to iteratively recommend items to
read/watch, aiming to maximize the cumulative relevance of the entire
set. Alternatively, we might want to pick users from our user base or a
social network to promote a given item. In this setting, how should we
select items to maximize total utility?

We will call this general class of problems AVID - Adaptive Valuable
Item Discovery. To solve AVID, we need to address an
exploration-exploitation dilemma, where we must select items that
maximize utility (exploit) while simultaneously estimating the utility
function (explore). We address these challenges by using ideas from
Gaussian Process optimization and multi-armed bandits to provide a
principled approach to AVID with strong theoretical guarantees.
Specifically, we introduce a novel algorithm, GP-Select, for discovering
high value items in a very general setting. GP-Select can be used
whenever the similarity between items can be captured by a positive
definite kernel function, and the utility function has low norm in the
Reproducing Kernel Hilbert Space (RKHS) associated with the kernel. The
algorithm models the utility function as a sample from a Gaussian
process distribution, and uses its predictive uncertainty to navigate
the exploration-exploitation tradeoff via an upper confidence based
sampling approach that takes item costs into account.

We also consider a natural extension of AVID, where the goal is to
obtain a diverse set of items. This is an important requirement in many
experimental design problems where, for example for reasons of
robustness, we seek to identify a collection of diverse, yet high
quality designs. In our drug design example, very similar compounds
might cause similar side effects in the later stages of testing. Hence,
we might require a certain diversity in the selected subset while still
trying to maximize total value. In this work, we address the setting
where our preference for diversity is quantified by a submodular
function, modeling diminishing returns incurred when picking many
similar items. We prove that GP-Select provides an effective tradeoff of
value and diversity, establishing bounds on its regret against an
omniscient algorithm with access to the unknown objective. Our results
substantially expand the class of problems that can be solved with upper
confidence based sampling methods - desirable for their simplicity - in
a principled manner.

We evaluate GP-Select in three real-world case studies. We first
demonstrate how GP-Select can be used to maintain an accurate repository
of ticket prices in a Global Distribution System that serves a large
number of airlines and travel agencies. Here the challenge is to
selectively recompute ticket prices that likely have changed, under a
budget on the number of computations allowed. Secondly, we demonstrate
how GP-Select is able to determine a diverse set of candidate designs in
a vaccine design application exhibiting high binding affinity to their
target receptors. In these experiments, we also study the effect of
inducing diversity, and non-uniform selection cost. Finally, we present
results on a web-scale recommender systems dataset provided by Yahoo!
where the task is to adaptively select user-item pairs that maximize
interaction (clicks, likes, shares, etc.).

Our experiments highlight the efficacy of GP-Select and its
applicability to a variety of problems relevant to practitioners. In
particular, with our suggested application of lazy variance updates, we
are able to speed up the execution by up to almost 40 times, making it
usable on web-scale datasets.

2. AVID: PRELIMINARIES 

We are given a set $V = \{1, \ldots, n\}$ of $n$ objects. There is a
utility function $f : V \to \mathbb{R}_{\ge 0}$ that assigns a
non-negative value to every item in the set. Similarly, there is a
function $c : V \to \mathbb{R}_{> 0}$, assigning a positive cost
$c_v = c(v) \in [c_{\min}, c_{\max}]$ to each item $v$. Given a subset
$S \subseteq V$, its value $F(S) = \sum_{v \in S} f(v)$ is the sum of
the values of the selected items, and its cost
$C(S) = \sum_{v \in S} c_v$ is the cumulative cost of the items. Given a
budget $B > 0$, our goal is to select

$$S_B^* = \operatorname*{argmax}_{C(S) \le B} F(S), \tag{1}$$

i.e., a subset of maximum value, with cost bounded by $B$.

If we knew the utility function $f$, then Problem (1) would be the
classical knapsack problem. While NP-hard, for any $\epsilon > 0$, an
$\epsilon$-optimal solution can be found via dynamic programming.

But what if we do not know $f$? In this case, we consider choosing a
subset $S$ in a sequential manner. We pick one item at a time, after
which the value of the selected item is revealed (possibly perturbed by
noise), and can be taken into account when selecting further items. We
term this sequential problem AVID - Adaptive Valuable Item Discovery.

Equivalently to maximizing the cumulative value $F(S)$, we aim to
minimize the regret, i.e., the loss in cumulative value compared to an
omniscient optimal algorithm that knows $f$. Formally, the regret of a
subset $S_B$ of cost $B$ is defined as $R_B = F(S_B^*) - F(S_B)$. We
seek an algorithm whose regret grows slowly (sublinearly) with the
budget $B$, so that the average regret $R_B / B$ goes to 0.
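To make the benchmark in Problem (1) and the regret $R_B$ concrete, the following is a minimal sketch that assumes integer costs and fully known values. The function names and the toy numbers are illustrative and do not come from the paper, and the code uses the textbook exact dynamic program over budgets rather than an $\epsilon$-approximation scheme.

```python
def knapsack_best_value(values, costs, budget):
    """Exact 0/1 knapsack for integer costs: max total value with cost <= budget."""
    best = [0.0] * (budget + 1)  # best[b] = max value achievable with cost <= b
    for value, cost in zip(values, costs):
        # Iterate budgets downwards so that each item is used at most once.
        for b in range(budget, cost - 1, -1):
            best[b] = max(best[b], best[b - cost] + value)
    return best[budget]


def regret(values, costs, budget, selected):
    """R_B = F(S*_B) - F(S_B) for a feasible selection of item indices."""
    assert sum(costs[i] for i in selected) <= budget
    achieved = sum(values[i] for i in selected)
    return knapsack_best_value(values, costs, budget) - achieved


values = [6.0, 10.0, 12.0]   # hypothetical utilities f(v)
costs = [1, 2, 3]            # hypothetical integer costs c_v
budget = 5
print(knapsack_best_value(values, costs, budget))
print(regret(values, costs, budget, selected=[0, 1]))
```

In the AVID setting the values are of course unknown while selecting; the omniscient benchmark is only used to evaluate a selection after the fact.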

Diversity. 

In several important applications, we not only seek items of high value,
but also want to optimize the diversity of the selected set. One way to
achieve this goal is to add to our objective another term that prefers
diverse sets. Concretely, we extend the scope of AVID by considering
objective functions of the form:

$$F(S) = (1 - \lambda) \sum_{v \in S} f(v) + \lambda D(S). \tag{2}$$

Here, $D(S)$ is a known measure of the diversity of the selected subset
$S$. Many such diversity-encouraging objectives have been considered in
the literature. We will present an algorithm that is guaranteed to
choose near-optimal sets whenever the function $D$ satisfies
submodularity. Submodularity is a natural notion of diminishing returns,
capturing the idea that adding an item helps less if more similar items
were already picked. We discuss examples in Section 4. The parameter
$\lambda \in [0, 1]$ balances the relative importance of value and
diversity of the selected set. In the case where $f$ is known,
maximizing $F$ requires maximizing a submodular function. This task is
NP-hard, but can be solved near-optimally using a greedy algorithm [24].
In this paper, we address the novel setting where $D$ is any known
submodular function but $f$ is unknown, and needs to be estimated.

Regularity Assumptions. 

In the general case, where $f$ can be any function, it is hopeless to
compete against the optimal subset since, in the worst case, $f$ could
be adversarial and return a value of 0 for each of the items selected by
the algorithm, and positive utility only for those not selected. Hence,
we make some natural assumptions on $f$ such that the problem becomes
tractable. In practice, it is reasonable to assume that $f$ varies
'smoothly' over the candidate set $V$ such that similar items in $V$
have similar $f$ values. In this work, we model this by assuming that
the similarity $k(v, v')$ of any pair of items $v, v' \in V$ is given by
a positive definite kernel function $k : V \times V \to \mathbb{R}$, and
that $f$ has low "complexity" as measured by the norm in the Reproducing
Kernel Hilbert Space (RKHS) associated with kernel $k$. The RKHS
$\mathcal{H}_k(V)$ is a complete subspace of $L_2(V)$ of 'smooth'
functions with an inner product $\langle \cdot, \cdot \rangle_k$ s.t.
$\langle f, k(v, \cdot) \rangle_k = f(v)$ for all
$f \in \mathcal{H}_k(V)$. By choosing appropriate kernel functions, we
can flexibly handle items of different types (vectors, strings, graphs
etc.). We use the notation $K$ to refer to the $n \times n$ kernel
(Gram) matrix obtained by evaluating $k(v, v')$ for all pairs of items.


Explore-Exploit Tradeoff. 

Given the regularity assumptions about the unknown function $f$, the
task can be intuitively viewed as one of trading off exploration and
exploitation. That is, we can either greedily utilize our current
knowledge of $f$ by picking the next item predicted to be of high value,
or we can choose to pick an item that may not have the highest expected
value but most reduces the uncertainty about $f$ across the other items.
This challenge is akin to the dilemma faced in multi-armed bandit
problems. An important difference in our setting, motivated by practical
considerations, is that we cannot select the same item multiple times.
As a consequence, classical algorithms for multi-armed bandits (such as
UCB1 of Auer et al. or GP-UCB of Srinivas et al. [30]) cannot be
applied, since they require that repeated experimentation with the same
"arm" is possible. Furthermore, classical bandit algorithms do not allow
arms to have different costs. In fact, our setting is strictly more
general than the bandit setting: we can allow repeated selection of a
single item $v$ by just creating multiple, identical copies
$v^{(1)}, v^{(2)}, \ldots$ with identical utility (i.e.,
$f(v^{(1)}) = f(v^{(2)}) = \ldots$), which can be modeled using a
suitably chosen kernel.

Nevertheless, we build on ideas from modern bandit algorithms that
exploit smoothness assumptions on the payoff function. In particular,
Srinivas et al. [30] show how the explore-exploit dilemma can be
addressed in settings where, as in our case, the reward function has
bounded RKHS norm for a given kernel function $k$. We interpret the
unknown value function $f$ as a sample from a Gaussian Process (GP)
prior, with prior mean 0 and covariance function $k$. Consequently, we
model the function as a collection of normally distributed random
variables, one for each item. They are jointly distributed, such that
their covariances are given by the kernel:

$$\operatorname{Cov}(f(v), f(v')) = k(v, v').$$

This joint distribution then allows us to make predictions about
unobserved items via Bayesian inference in the GP model. Suppose we have
already observed feedback $y_t = [y_1, \ldots, y_t]^T$ for $t$ items
$S_t = \{v_1, \ldots, v_t\}$, i.e., $y_i = f(v_i) + \epsilon_i$, where
$\epsilon_i$ is independent, zero-mean Gaussian noise with variance
$\sigma^2$. Then, for each remaining item $v$, its predictive
distribution for $f(v)$ is Gaussian, with mean and variance (using noise
variance $\sigma^2$, according to our assumptions) given by:

$$\mu_t(v) = k_t(v)^T (K_t + \sigma^2 I)^{-1} y_t, \tag{3}$$

$$\sigma_t^2(v) = k(v, v) - k_t(v)^T (K_t + \sigma^2 I)^{-1} k_t(v), \tag{4}$$

where $k_t(v) = [k(v_1, v), \ldots, k(v_t, v)]^T$, $K_t$ is the positive
semi-definite kernel matrix such that for $i, j \le t$,
$[K_t]_{i,j} = k(v_i, v_j)$, and $I$ is the $t \times t$ identity
matrix. In Section 3 we show how we can use these predictive
distributions to navigate the exploration-exploitation tradeoff. Note
that while we propose a Bayesian algorithm (using a GP prior, and
Gaussian likelihood), we prove agnostic results about arbitrary
functions $f$ with bounded norm, and arbitrary noise bounded by
$\sigma$.
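Equations (3) and (4) translate almost directly into linear algebra code. The sketch below is one possible NumPy rendering, not the authors' implementation; the helper names `rbf_kernel` and `gp_posterior`, the RBF kernel choice and the toy grid are assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(X, Y, lengthscale=1.0):
    """Gaussian (RBF) kernel matrix between rows of X and rows of Y."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / lengthscale**2)


def gp_posterior(K, observed, y, noise_var):
    """Posterior mean and variance of f at every item, as in Eqs. (3) and (4).

    K         : (n, n) kernel matrix over all items
    observed  : list of indices v_1..v_t already selected
    y         : noisy observations for those indices
    noise_var : sigma^2
    """
    if len(observed) == 0:
        return np.zeros(K.shape[0]), np.diag(K).copy()
    K_t = K[np.ix_(observed, observed)]           # t x t matrix K_t
    k_t = K[observed, :]                          # column v holds k_t(v)
    A = K_t + noise_var * np.eye(len(observed))
    mean = k_t.T @ np.linalg.solve(A, np.asarray(y, dtype=float))
    var = np.diag(K) - np.einsum("ij,ij->j", k_t, np.linalg.solve(A, k_t))
    return mean, np.maximum(var, 0.0)             # clip tiny negative round-off


X = np.linspace(0.0, 1.0, 5)[:, None]             # hypothetical 1-D item features
K = rbf_kernel(X, X, lengthscale=0.3)
mean, var = gp_posterior(K, observed=[2], y=[1.0], noise_var=0.01)
print(mean.round(3), var.round(3))
```

Observing the middle item shrinks its own variance to roughly the noise level and reduces the variance of nearby items more than that of distant ones, which is the mechanism the selection rule relies on.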


Algorithm 1 GP-Select

Input: Ground set $V$, kernel $k$ and budget $B$
Initialize selection set $S \leftarrow \emptyset$
for $t = 1, 2, \ldots, B$ do
    Model Update:
        $[\mu_{t-1}(\cdot), \sigma_{t-1}^2(\cdot)] \leftarrow$ GP-Inference$(K, (S, y_{1:t-1}))$
    Item Selection:
        Set $v_t \leftarrow \operatorname*{argmax}_{v \in V \setminus S} \mu_{t-1}(v) + \beta_t^{1/2} \sigma_{t-1}(v)$
        $S \leftarrow S \cup \{v_t\}$
        Receive feedback $y_t = f(v_t) + \epsilon_t$
end for


3. THE UNIFORM COST CASE 

We first provide the solution for the simple case of uniform costs. In
this setting, if the values are known, a greedy algorithm adding items
of maximal value solves Problem (1) optimally. Our key idea in the
unknown value case is to mimic this greedy algorithm. Instead of
greedily adding the item $v$ with highest predicted gain
$\mu_{t-1}(v)$, we trade exploration and exploitation by greedily
optimizing an optimistic estimate of the item's value. Concretely, our
algorithm GP-Select for the uniform cost case performs a model update
and selects the next item upon receiving feedback for the currently
selected item. The model update is performed according to Equations (3)
and (4).

For our selection rule, we borrow a key concept from multi-armed
bandits: upper confidence bound sampling. Concretely, we choose

$$v_t = \operatorname*{argmax}_{v \in V \setminus \{v_{1:t-1}\}} \mu_{t-1}(v) + \beta_t^{1/2} \sigma_{t-1}(v). \tag{5}$$

The tradeoff between exploration and exploitation is implicitly handled
by the time varying parameter $\beta_t$ (defined in Theorem 1) that
alters the weighting of the posterior mean (favoring exploitation by
selecting items with high expected value) and standard deviation
(favoring exploration by selecting items that we are uncertain about).
$\beta_t$ is chosen such that
$\mu_{t-1}(v) + \beta_t^{1/2} \sigma_{t-1}(v)$ is a high-probability
upper bound on $f(v)$, explained further below.
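The loop of Algorithm 1 with selection rule (5) can be sketched in a few lines. This is an illustrative rendering under simplifying assumptions: $\beta_t$ is held at a constant `beta` rather than following the schedule of Theorem 1, the posterior is recomputed from scratch in every round, and the function name `gp_select`, the kernel and the hidden utility are all hypothetical.

```python
import numpy as np


def gp_select(K, observe, budget, noise_var=0.01, beta=4.0):
    """Uniform-cost GP-Select sketch: pick `budget` distinct items by UCB score.

    observe(v) returns a noisy value y = f(v) + noise for item index v.
    beta is held constant here for simplicity (an assumption of this sketch).
    """
    n = K.shape[0]
    selected, ys = [], []
    for _ in range(budget):
        if selected:
            A = K[np.ix_(selected, selected)] + noise_var * np.eye(len(selected))
            k_t = K[selected, :]
            mean = k_t.T @ np.linalg.solve(A, np.array(ys))          # Eq. (3)
            var = np.diag(K) - np.einsum("ij,ij->j", k_t, np.linalg.solve(A, k_t))  # Eq. (4)
        else:
            mean, var = np.zeros(n), np.diag(K).copy()
        ucb = mean + np.sqrt(beta) * np.sqrt(np.maximum(var, 0.0))   # rule (5)
        if selected:
            ucb[selected] = -np.inf      # each item may be chosen only once
        v = int(np.argmax(ucb))
        selected.append(v)
        ys.append(observe(v))
    return selected


rng = np.random.default_rng(0)
X = np.linspace(0.0, 1.0, 50)[:, None]
K = np.exp(-0.5 * (X - X.T) ** 2 / 0.1**2)       # RBF kernel on a 1-D grid
f = np.exp(-((X[:, 0] - 0.7) ** 2) / 0.02)       # hypothetical hidden utility
chosen = gp_select(K, lambda v: f[v] + 0.1 * rng.standard_normal(), budget=10)
print(chosen)
```

Larger values of `beta` push the rule toward uncertain items, smaller values toward items with high predicted value.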

Regret bounds. 

We now present bounds on the regret $R_B$ incurred by GP-Select.
Crucially, they do not depend on the size of the ground set $|V|$, but
only on a quantity $C_K$ that depends on the task specific kernel
capturing the regularity of the utility function over the set of items.
Specifically, for a kernel matrix $K$, the quantity $C_K$ is given by:

$$C_K = \frac{1}{2} \log \left| I + \sigma^{-2} K \right|. \tag{6}$$

We now present the main result about GP-Select in the uniform cost case.
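The quantity $C_K$ in Equation (6) is a log-determinant and is cheap to evaluate for a given kernel matrix. The following sketch uses `numpy.linalg.slogdet` for numerical stability; the function name `information_capacity` and the two toy kernels are illustrative choices, not taken from the paper.

```python
import numpy as np


def information_capacity(K, noise_var):
    """C_K = 0.5 * log det(I + sigma^{-2} K), computed stably via slogdet."""
    n = K.shape[0]
    sign, logdet = np.linalg.slogdet(np.eye(n) + K / noise_var)
    return 0.5 * logdet


# Two extreme hypothetical kernels over n = 100 items with unit prior variance.
n, noise_var = 100, 1.0
independent = np.eye(n)            # no similarity between items
identical = np.ones((n, n))        # all items perfectly correlated
print(information_capacity(independent, noise_var))  # equals 0.5 * n * log(2)
print(information_capacity(identical, noise_var))    # equals 0.5 * log(1 + n)
```

The two extremes illustrate why $C_K$ measures regularity: with no similarity it grows linearly in $n$, whereas strong correlation between items keeps it logarithmic.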

Theorem 1. Let $\delta \in (0, 1)$. Suppose that the function $f$ lies
in the RKHS $\mathcal{H}_k(V)$ corresponding to the kernel $k(v, v')$
with an upper bound on the norm of $f$ w.r.t. $k$ given by $R$ (i.e.,
$\|f\|_k^2 \le R$). Further suppose that the noise has zero mean
conditioned on the history and is bounded by $\sigma$ almost surely. Let
$\beta_t = 2R + 300\, C_K \log^3(t / \delta)$. Running GP-Select with a
GP prior using mean zero, covariance $k(v, v')$ and noise model
$N(0, \sigma^2)$, we obtain a regret bound of
$O^*(\sqrt{B}\,(R \sqrt{C_K} + C_K))$ w.h.p. Specifically,

$$\Pr\left\{ R_B \le \sqrt{C_1 B \beta_B C_K} \;\; \forall B \ge 1 \right\} \ge 1 - \delta,$$

where $C_1 = \frac{8}{\log(1 + \sigma^{-2})}$.

The proof of this theorem is presented in the Appendix. 

Interpretation of the Theorem. 

Theorem 1 guarantees that under sufficiently regular $f$ and suitable
choice of $\beta_t$, the average regret compared to the best subset
approaches 0 as $B$ increases. Our regret bound depends only on the
constant $C_K$ rather than the actual size of the set $V$. It is
instructive to think of how the value $C_K$ grows as the size of the
ground set, $n = |V|$, increases. As long as the kernel function is
bounded, it can be seen that $C_K$ is $O(n)$. For many commonly used
kernel functions, however, this quantity grows strictly sublinearly in
the number $n$ of elements. For instance, for the popular RBF kernel in
$d$ dimensions (that is, $V \subset \mathbb{R}^d$), it holds that
$C_K = O((\log n)^{d+1})$. See Srinivas et al. [30] for this and other
analytical bounds for other kernels. In any case, a problem specific
$C_K$ can always be computed efficiently using the formula in Equation
(6). Further note that as long as we use a universal kernel $k$ (like
the commonly used Gaussian kernel), for finite item sets (as we consider
here) the RKHS norm $\|f\|_k$ is always bounded. Hence, Theorem 1
guarantees that our regret will always be bounded for such kernels,
provided we choose a large enough value for $R$.

An important point to be made here is that the value of $\beta_t$ as
prescribed by Theorem 1 is chosen very conservatively for the sake of
the theoretical analysis. For most practical applications, $\beta_t$ can
be scaled down to achieve faster convergence and lower regret.

4. SELECTING DIVERSE SUBSETS 

In some cases, we not only seek high cumulative value of the solution
set, but also prefer diversity. This can be the case because we desire
robustness, fairness etc. Formally, we can encode this diversity
requirement into the objective function as done in (2). Here, $f$ is an
unknown function that operates on individual elements, while $D$ is a
known set function that captures the diversity of a subset. It is
natural to model diversity as a submodular function. Formally, a set
function $D : 2^V \to \mathbb{R}$ is submodular if for every
$A \subseteq B \subseteq V$ and $v \in V \setminus B$, it holds that

$$\Delta_D(v \mid A) \ge \Delta_D(v \mid B), \tag{7}$$

where $\Delta_D(v \mid A) = D(A \cup \{v\}) - D(A)$ is called the
marginal gain of adding $v$ to set $A$. $D$ is called monotone if,
whenever $A \subseteq B$, it holds that $D(A) \le D(B)$.

The rationale behind using submodular functions to model diversity is
based on the intuition that adding a new element provides less benefit
(marginal gain) as the set of similar items already selected increases.
Many functions can be chosen to formalize this intuition. In our
setting, a natural monotone submodular objective that captures the
similarity as expressed via our kernel is

$$D(S) = \frac{1}{2} \log \left| I + \sigma_n^{-2} K_{S,S} \right|, \tag{8}$$

where $\sigma_n > 0$. We use this objective in our experiments. For this
choice, the marginal gain of adding an element $v$ to a set $S$ is given
by:

$$\Delta_D(v \mid S) = \frac{1}{2} \log\left(1 + \sigma_n^{-2} \sigma_{v|S}^2\right), \tag{9}$$

where $\sigma_{v|S}^2$ is the predictive variance of $f(v)$ in a GP
model, where the values of elements in $S$ have already been observed up
to Gaussian noise with variance $\sigma_n^2$. Conveniently, while
executing GP-Select, if $\sigma = \sigma_n$, we already compute
$\sigma_{v|S}^2$ in order to evaluate the decision rule (5). Hence, at
almost no additional cost we can compute the marginal gain in diversity
for any candidate item $v$.
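The closed form for the marginal gain can be checked numerically against the definition $D(S \cup \{v\}) - D(S)$. The sketch below does this on a small random instance; the helper names, the RBF kernel and the chosen indices are illustrative assumptions.

```python
import numpy as np


def diversity(K, S, noise_var):
    """D(S) = 0.5 * log det(I + sigma_n^{-2} K_{S,S}), the log-det objective."""
    if len(S) == 0:
        return 0.0
    K_S = K[np.ix_(S, S)]
    return 0.5 * np.linalg.slogdet(np.eye(len(S)) + K_S / noise_var)[1]


def predictive_variance(K, S, v, noise_var):
    """GP variance of f(v) after noisy observations of the items in S."""
    if len(S) == 0:
        return K[v, v]
    A = K[np.ix_(S, S)] + noise_var * np.eye(len(S))
    k = K[S, v]
    return K[v, v] - k @ np.linalg.solve(A, k)


rng = np.random.default_rng(1)
X = rng.normal(size=(8, 2))                       # hypothetical item features
K = np.exp(-0.5 * ((X[:, None] - X[None]) ** 2).sum(-1))
S, v, noise_var = [0, 3, 5], 6, 0.5
gain_direct = diversity(K, S + [v], noise_var) - diversity(K, S, noise_var)
gain_closed = 0.5 * np.log(1.0 + predictive_variance(K, S, v, noise_var) / noise_var)
print(gain_direct, gain_closed)
```

The two numbers agree up to floating-point error, which follows from the Schur complement formula for the determinant of a block matrix.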

In order to select items that provide value and diversity, it is natural
to modify the selection rule of GP-Select in the following way:

$$v_t = \operatorname*{argmax}_{v \in V \setminus \{v_{1:t-1}\}} (1 - \lambda) \left[ \mu_{t-1}(v) + \beta_t^{1/2} \sigma_{t-1}(v) \right] + \lambda \Delta_D(v \mid \{v_1, \ldots, v_{t-1}\}). \tag{10}$$

This decision rule greedily selects the item $v$ that maximizes a
high-probability upper bound on the marginal gain
$\Delta_F(v \mid \{v_1, \ldots, v_{t-1}\})$ of the unknown combined
objective $F$.
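As a small illustration of rule (10), the sketch below scores candidates when $D$ is the log-determinant objective (8) and $\sigma = \sigma_n$, so that the posterior variance can be reused for the diversity term. The function name and all numbers are hypothetical, and the caller is assumed to mask already selected items before taking the argmax.

```python
import numpy as np


def diverse_ucb_scores(mean, var, lam, beta, noise_var):
    """Score of rule (10) when D is the log-det objective and sigma = sigma_n."""
    ucb = mean + np.sqrt(beta) * np.sqrt(var)          # optimistic value estimate
    diversity_gain = 0.5 * np.log1p(var / noise_var)   # marginal gain in D
    return (1.0 - lam) * ucb + lam * diversity_gain


mean = np.array([0.9, 0.5, 0.1])      # hypothetical posterior means
var = np.array([0.01, 0.20, 0.80])    # hypothetical posterior variances
for lam in (0.0, 1.0):
    scores = diverse_ucb_scores(mean, var, lam, beta=0.04, noise_var=0.1)
    print(lam, int(np.argmax(scores)))
```

With these numbers, $\lambda = 0$ selects the item with the highest optimistic value, while $\lambda = 1$ selects the item with the largest predictive variance, since the diversity gain is increasing in the variance.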


Regret bound. 

The regret bound in Section 3 depended on the fact that we were
optimizing against $f$ that assigned values to individual elements,
$v \in V$. The same bounds need not hold in the more challenging setting
when trading value against diversity. In fact, even if both $f$ and $D$
are completely known for all $v \in V$, it turns out that optimizing $F$
in (2) is NP-hard for many monotone submodular functions $D$. While
finding the optimal set is hard, Nemhauser et al. [24] show that - for a
known monotone submodular function - a simple greedy algorithm provides
a near-optimal solution.

Formally, suppose $S'_0 = \emptyset$ and $S'_{i+1}$ is the greedy
extension of $S'_i$. That is,
$S'_{i+1} = S'_i \cup \{\operatorname*{argmax}_{v \in V \setminus S'_i} \Delta_F(v \mid S'_i)\}$.
Thus, $S'_B$ is the set we obtain when selecting $B$ items, always
greedily maximizing the marginal gain over the items picked so far. Then
it holds that
$F(S'_B) \ge (1 - 1/e) \max_{|S| \le B} F(S) = (1 - 1/e) F(S_B^*)$.
Moreover, without further assumptions about $D(S)$ and $f$, no efficient
algorithm will produce better solutions in general. Since we are
interested in computationally efficient algorithms, we measure the
regret of a solution $S_B$ by comparing $F(S_B)$ to
$(1 - 1/e) F(S_B^*)$, which is the bound satisfied by the greedy
solution. Formally,

$$R_B = (1 - 1/e) F(S_B^*) - F(S_B).$$


Theorem 2. Under the same assumptions and conditions of Theorem 1,

$$\Pr\left\{ R_B \le \sqrt{C_1 B \beta_B C_K} \;\; \forall B \ge 1 \right\} \ge 1 - \delta,$$

where $R_B = (1 - 1/e) F(S_B^*) - F(S_B)$ is the regret with respect to
the value guaranteed when optimizing greedily given full knowledge of
$f$ and $D$.


Please refer to the Appendix for the proof of this theorem. It rests on
interpreting GP-Select as implementing an approximate version of the
greedy algorithm maximizing $\Delta_F(v \mid S_t)$. In fact, Theorem 2
can be generalized to a large number of settings where the greedy
algorithm is known to provide near-optimal solutions for constrained
submodular maximization.

As an illustration of the application of this modified GP-Select to
diverse subset selection, refer to Figure 1(a).
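The greedy benchmark that the regret is measured against can be illustrated on a small, fully known objective. The sketch below uses a hypothetical coverage function (a standard example of a monotone submodular function) instead of the paper's combined objective, and checks the $(1 - 1/e)$ guarantee by brute force; all names and data are illustrative.

```python
from itertools import combinations
from math import e


def greedy_maximize(F, ground_set, budget):
    """Greedily add the item with the largest marginal gain F(S + v) - F(S)."""
    S = []
    for _ in range(budget):
        candidates = [v for v in ground_set if v not in S]
        S.append(max(candidates, key=lambda v: F(S + [v]) - F(S)))
    return S


# Hypothetical coverage objective: each item covers a set of topics.
covers = {0: {1, 2, 3}, 1: {3, 4}, 2: {4, 5, 6}, 3: {1, 6}, 4: {2, 5}}


def F(S):
    """Number of distinct topics covered by the items in S."""
    return float(len(set().union(*(covers[v] for v in S)))) if S else 0.0


B = 2
greedy_value = F(greedy_maximize(F, list(covers), B))
optimal_value = max(F(list(S)) for S in combinations(covers, B))
print(greedy_value, optimal_value, greedy_value >= (1 - 1 / e) * optimal_value)
```

On this instance the greedy set happens to be optimal; in general only the $(1 - 1/e)$ factor is guaranteed.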









[Figure 1 panels: (a) Balancing utility and diversity; (b) Average
Regret (Diversity); (c) Effect of Inducing Diversity]

Figure 1: (a) Illustration of sets selected for trading $f$ against $D$
when varying parameter $\lambda$. (b) Performance of GP-Select in
selecting diverse subsets. For different values of $\lambda$, the
average regret against the greedy approximate algorithm decreases.
(c) Improvements in diversity can be obtained at little loss of utility.


When $\lambda = 0$, GP-Select reverts back to Algorithm 1 and hence
picks locations based only on their expected $f$ value. This is clear
from the thick bands of points sampled near the maximum. At
$\lambda = 0.6$, GP-Select balances between expected $f$ values of the
points and the marginal gain in diversity of the points picked,
$\Delta_D(v \mid S)$. At $\lambda$ close to 1, GP-Select picks mostly by
marginal gain, which will be approximately uniform if the kernel used is
isotropic (e.g., Gaussian kernel).


5. NON-UNIFORM COSTS 


In the general case where each element $v \in V$ has a different cost of
selection $c_v$, the budget $B$ is the total cost of all items in the
selected subset. We modify the selection rule in Algorithm 1 to take the
estimated cost-benefit ratio into account. Most of the other steps
remain the same except ensuring that we respect the budget, and the
formula for computing $\beta_t$. The new selection rule for the setting
without diversity is:

$$v_t = \operatorname*{argmax}_{v \in V \setminus S,\; c_v \le B - C(S)} \frac{\mu_{t-1}(v) + \beta_t^{1/2} \sigma_{t-1}(v)}{c_v}. \tag{11}$$


Hence, instead of maximizing an optimistic estimate of the item's value,
we greedily maximize an optimistic estimate of the benefit-cost ratio.
Note that this greedy rule encourages some natural opportunistic
exploration: Initially, it will select items that we are very uncertain
about (large $\sigma_{t-1}$), but that also have little cost. Later on,
as the utility is more accurately estimated, it will also invest in more
expensive items, as long as their expected value ($\mu_{t-1}$) is high.

The idea above can be generalized to encourage diversity as well. The
selection rule in (10) can be modified to maximize the ratio

$$\frac{(1 - \lambda) \left[ \mu_{t-1}(v) + \beta_t^{1/2} \sigma_{t-1}(v) \right] + \lambda \Delta_D(v \mid S)}{c_v}. \tag{12}$$

Hence, in this most general setting, we greedily optimize a
high-probability upper bound on the cost-benefit ratio of the marginal
gain for the joint objective.
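A single step of the cost-aware rule without diversity can be sketched as follows. The function name `cost_aware_choice`, the constant `beta` and the toy numbers are assumptions made for illustration; the posterior mean and variance are taken as given.

```python
import numpy as np


def cost_aware_choice(mean, var, costs, remaining_budget, selected, beta):
    """Index maximizing (mean + sqrt(beta) * std) / cost among affordable,
    not-yet-selected items; returns None if no item is affordable."""
    ratio = (mean + np.sqrt(beta) * np.sqrt(var)) / costs
    feasible = costs <= remaining_budget        # respect the budget B - C(S)
    if selected:
        feasible[list(selected)] = False        # exclude items already in S
    if not feasible.any():
        return None
    return int(np.argmax(np.where(feasible, ratio, -np.inf)))


mean = np.array([1.0, 0.9, 0.2])      # hypothetical posterior means
var = np.array([0.04, 0.04, 0.04])    # hypothetical posterior variances
costs = np.array([5.0, 1.0, 1.0])     # hypothetical item costs c_v
print(cost_aware_choice(mean, var, costs, remaining_budget=3.0, selected=[], beta=1.0))
```

Here the item with the highest predicted value is unaffordable, so the rule falls back to the cheap item with the best optimistic value per unit cost. Adding the weighted diversity gain to the numerator gives the ratio in (12).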

Upon these modifications, we can obtain the result presented in
Theorem 3. The result holds for running GP-Select for selecting diverse
items with items in the ground set having non-uniform costs of
selection. Again, we present the proof of the theorem in the Appendix.

Theorem 3. Under the same assumptions and conditions of Theorem 1,
running GP-Select with non-uniform costs for the items, we have that


$$\Pr\left\{ R_B \le \ldots\, B\, \beta_B\, C_K \ldots \;\; \forall B \ge 1 \right\} \ge 1 - \delta,$$

where $R_B = (1 - 1/e) F(S_B^*) - F(S_B)$ is the regret with respect to
the value guaranteed when optimizing greedily given full knowledge of
$f$ and $D$.

6. EXPERIMENTAL EVALUATION 

6.1 Case Study I: Airline Price Update Prediction Task

Amadeus IT Group$^1$ is a Global Distribution System (GDS) for airline
prices. One of the services provided by Amadeus is finding the cheapest
return fare between cities X and Y on requested dates of travel. This is
currently done by frequently querying all the airlines for their
respective cheapest fares for each pair of cities and then aggregating
the results to maintain this information. This consumes a lot of
bandwidth and time. Also, computing the fare for a given request is a
computationally expensive task, as the cheapest fare might include
multiple hops possibly operated by different airlines. Hence, a table of
precomputed current best prices is maintained in order to quickly
respond to fare requests by customers. Since the database is typically
very large and computing fares is relatively expensive in terms of
computation and network bandwidth, it is challenging to frequently
recompute all fares (i.e., update the entire table). Since prices for
similar fare requests (table entries) often change at the same time, the
goal is to selectively recompute only entries that changed. This task
can be naturally captured in our setting, where items correspond to
table entries selected for recomputation, and the utility of an item is
1 if the entry changed and 0 otherwise.

The data provided by Amadeus for this task was collected 
in December 2011. It consists of cheapest fares computed for 

1 http://www.amadeus.com 























































[Figure 2 panels: (a) Average Regret; (b) Non-uniform costs; (c) Flight
ticket price change prediction]

Figure 2: (a) While average regret decreases for all non-naive
algorithms, GP-Select drops much earlier and continues to outperform the
baselines in the vaccine design task. (b) Comparison of GP-Select with
the baselines for the vaccine design task under non-uniform item costs.
(c) GP-Select outperforms benchmarks on the fare change prediction task.


50,000 routes (origin-destination pairs) and for all departure dates up
to 90 days into the future. For each departure date, the return date
could be up to 15 days after the departure. The budget for selection
corresponds to the total number of price refresh computations allowed.
Our performance metric is the ratio between the total number of correct
prices (i.e., correct entries in the table) and the total number of
prices in the repository. Since we have the data with all the correct
prices, we are able to compute the number of prices an algorithm would
have failed to update (regret).

In our experiments, we pool all the data for a given route together, and
sequentially process the data set, one "current date" at a time. The
task is to discover items (table entries) that have changed between the
current date and the next date. We thus instantiate one instance of the
active discovery problem per route per day. For each instance, we select
from $90 \cdot 15 = 1350$ prices to recompute. Typically only 22% of the
data changed between days, hence even with a budget of 0, around 78% of
the prices are correct. In order to capture similarity between items
(table entries), we use the following features: date, origin,
destination, days until departure, duration of stay, current price. We
use an RBF kernel on these features and tune the bandwidth parameter
using data from four routes (origin-destination pairs). We compare
GP-Select against the following baselines:

1. Random: Naive baseline that picks points to query 
uniformly at random until the budget is exhausted 

2. Epsilon-First: A Support Vector Machine (SVM)
classifier is trained on a randomly sampled part of the
data. Concretely, we report the values for the two
settings that performed best among the options tried
(5% and 15% of the data). The SVM is then used to
predict changes, and the predicted points are updated.
When higher budgets are allowed, we use a weighted
version of the SVM that penalizes false negatives more
strongly than false positives.

Figure 2(c) presents the results of our experiments. In general,
GP-Select performs better than the baselines. Note that all three
non-naive algorithms reach similar maximum performance as the budget is
increased close to 100% of the total number of items.


6.2 Case Study II: Vaccine Design Task 

The second task we consider is an experimental design problem in drug
design. The goal is to discover peptide sequences that bind well to
major histocompatibility complex molecules (MHC). MHC molecules act as a
mediator for interaction of leukocytes (white blood cells) with other
leukocytes or body cells and play an important role in the immune
system. In our experiments, the goal is to choose peptide sequences for
vaccine design that maximize the binding affinity to these Type I MHC
molecules [25]. It is known from past experiments that similar sequences
have similar binding affinity responses [35]. Instead of selecting only
one optimal sequence, it is an important requirement to select multiple
sequences as candidates, and the actual determination of the best
sequence is delayed until more thorough tests are completed further down
the drug testing pipeline. Hence, while the task for this dataset can
also be viewed as a classification task (binders vs non-binders), we are
interested in the actual value of the binding affinity and want to pick
a set of peptide sequences that have maximal affinity values.

The dataset [25] consists of peptide sequences of length $l = 9$ for the
A_0201 task [38]. It contains 3089 peptide sequences along with their
binding affinities (IC50) as well as features describing the peptide
sequences. We normalize the binding affinities and construct a linear
kernel on the peptide features. The task is then to select a subset of
up to 500 sequences with maximal affinities. Since this is now
inherently a regression task, we used GP regression to estimate the
predictive mean of the underlying function. The following baseline
algorithms were considered for comparison:

1. Random: Naive algorithm that picks sets of size 500 
uniformly at random. We repeated this 30 times and 
report average total affinity values. 

2. Pure Explore: This algorithm picks the most uncertain 
sequence among the remaining sequences. The 
GP is refitted every time an observation is made. 

3. Pure Exploit: This algorithm always picks the next 
sequence as the one with the highest expected affinity 
as computed by GP regression, and the resulting values 
are used to retrain the GP. This is equivalent to the 
one-step lookahead policy of [9]. It is not feasible to 
implement two- or three-step lookahead with this large 
dataset. 

4. Epsilon First: This algorithm randomly explores for 
a few iterations and then, once the GP is trained with 
the observed responses, behaves exactly like Pure 
Exploit. Among all the options we tried, we report 
results for training on the first 20% of the budget (100 
sequences in this case) since this performed best. A 
major drawback of this algorithm is that it needs to 
know the budget a priori. We repeated this algorithm 
30 times on the data and report the average. 
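The selection rules behind these baselines can be sketched in a few lines. The following is a minimal illustration, not the implementation used in the experiments: the function names, the noise variance, the constant `beta`, the noiseless observations, and the definition of average regret as the gap to the top-B items are all assumptions made for this example, and Epsilon First is omitted for brevity.

```python
import numpy as np

def gp_posterior(K, picked, y, noise_var=1e-2):
    """Posterior mean and variance of every item, given values y observed at `picked`."""
    prior_var = np.diag(K).copy()
    if not picked:
        return np.zeros(K.shape[0]), prior_var
    K_ss = K[np.ix_(picked, picked)] + noise_var * np.eye(len(picked))
    K_vs = K[:, picked]
    mean = K_vs @ np.linalg.solve(K_ss, np.asarray(y, dtype=float))
    # diag(K_vs K_ss^{-1} K_vs^T), computed row by row
    reduction = np.sum(K_vs * np.linalg.solve(K_ss, K_vs.T).T, axis=1)
    return mean, np.maximum(prior_var - reduction, 0.0)

def select(K, values, budget, rule, beta=4.0, seed=0):
    """Sequentially pick `budget` distinct items using one of four scoring rules."""
    rng = np.random.default_rng(seed)
    picked, y = [], []
    for _ in range(budget):
        mean, var = gp_posterior(K, picked, y)
        if rule == "random":
            score = rng.random(len(values))
        elif rule == "explore":      # most uncertain item
            score = var.copy()
        elif rule == "exploit":      # highest predicted value
            score = mean.copy()
        elif rule == "ucb":          # optimistic score balancing both
            score = mean + np.sqrt(beta * var)
        else:
            raise ValueError(f"unknown rule: {rule}")
        if picked:
            score[picked] = -np.inf  # an item can be selected only once
        v = int(np.argmax(score))
        picked.append(v)
        y.append(values[v])          # value is revealed only after selection
    return picked

def avg_regret(values, picked):
    """Average gap between the best `len(picked)` items and the chosen ones."""
    B = len(picked)
    best = np.sort(values)[::-1][:B].sum()
    return (best - values[picked].sum()) / B

# Toy data: linear kernel on random features (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
K, values = X @ X.T, X @ rng.normal(size=5)
for rule in ("random", "explore", "exploit", "ucb"):
    print(rule, round(avg_regret(values, select(K, values, 10, rule)), 3))
```

On real data the kernel would be built from the peptide features, and the observations would be the normalized affinities.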


The results of these experiments are presented in Figure 2(a), 
which displays the average regret $R_B/B$. GP-Select 
clearly outperforms the baselines in the regret measure. The 
average regret drops much faster for GP-Select and 
remains lower than that of all the baselines across all the 
iterations. 


Choosing Valuable and Diverse Subsets. 

Using the same vaccine design dataset, we implement the 
modified version of GP-Select presented in Section 4 to 
select a diverse set of peptide sequences. This requirement 
of diversity is quite natural for our drug testing application: 
very similar sequences, while having similar affinity values, 
might also suffer from similar shortcomings in later stages of 
drug testing. We run GP-Select with different values of the 
tradeoff parameter $\lambda$ and report the results. Figure 1(b) shows 
the average regret $R_B/B$ of GP-Select for different values 
of $\lambda$. The plot demonstrates that when selecting diverse 
subsets, GP-Select achieves regret similar to 
the initial case, in which it selected only for value. Also, 
the average regret compared to the greedy optimal solution 
increases slightly as $\lambda$ increases. Figure [?] 
shows the inherent tradeoff between value and diversity. We 
use values $\lambda \in \{0, 0.5, 0.75, 0.875, 0.9375, 0.96875\}$ and 
plot the performance. We use the diversity function defined 
in Equation [?]. It should be noted that this function is 
in log scale. From the plot it is clear that for a significant 
increase in the diversity score, we lose very little functional 
value, which suggests that robustness of the solution set can 
be achieved at very little cost. The greedy curve on this 
same plot shows the tradeoff that the greedy algorithm 
obtains knowing the utility function. This result serves as a 
reference, as no efficient algorithm can match it without 
actually knowing the response function over all the sequences. 
Note that as we put all weight on diversity, as expected, 
GP-Select's performance converges to that of the greedy 
algorithm. 
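To make the value-diversity tradeoff concrete, the sketch below runs the greedy reference procedure on an objective of the form $F(S) = (1-\lambda)\sum_{v \in S} f(v) + \lambda D(S)$ with a known utility. The diversity function used here, $D(S) = \log\det(I + K_{S,S})$, is an assumption chosen only because it is monotone submodular and on a log scale; it is not necessarily the function used in the experiments, and the function names are hypothetical.

```python
import numpy as np

def diversity(K, S):
    """Assumed diversity score: log det(I + K_SS), monotone submodular in S."""
    if not S:
        return 0.0
    sub = K[np.ix_(S, S)]
    return float(np.linalg.slogdet(np.eye(len(S)) + sub)[1])

def greedy_tradeoff(K, f, budget, lam):
    """Greedy maximization of (1 - lam) * sum f(v) + lam * D(S) with known f."""
    S = []
    for _ in range(budget):
        base = diversity(K, S)
        best, best_gain = None, -np.inf
        for v in range(len(f)):
            if v in S:
                continue
            gain = (1 - lam) * f[v] + lam * (diversity(K, S + [v]) - base)
            if gain > best_gain:
                best, best_gain = v, gain
        S.append(best)
    return S

# Toy data (hypothetical): sweep the same lambda values as in the text.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
K, f = X @ X.T, rng.random(20)
for lam in (0, 0.5, 0.75, 0.875, 0.9375, 0.96875):
    S = greedy_tradeoff(K, f, 5, lam)
    print(lam, round(float(f[S].sum()), 3), round(diversity(K, S), 3))
```

With $\lambda = 0$ the procedure reduces to picking the items of highest value; larger $\lambda$ shifts weight towards the diversity term.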


Non-Uniform Costs. 

The vaccine design task also provides a natural motivation 
for the non-uniform costs setting. Typically, the cost of 
testing depends on the actual sequence being tested. Also, 
field tests differ markedly in their cost of execution. For 
our dataset, we did not have the costs associated with 
testing. However, we simulated non-uniform costs for selection 
of the peptide sequences by sampling $c_v$ uniformly from the 
range $[c_{\min}, c_{\max}]$. For different values of $[c_{\min}, c_{\max}]$, we 
found that GP-Select performed better than all the 
baselines considered. Note that we have used the greedy solution 
as the hindsight optimal one, and this is known to be at most 
a factor of 2 away from the true optimal solution. While 
the performance was similar for different values of $c_{\min}$ and 
$c_{\max}$, we report results for one of the settings in Figure [?](b), 
where $c_{\min} = 2$ and $c_{\max} = 7$. 
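The cost simulation and a hindsight greedy reference can be sketched as follows. This is an illustrative assumption about how such a reference might be computed for an additive utility: items are ranked by value per unit cost and added while they fit into the budget. The function names, the toy utilities, and the budget are hypothetical.

```python
import numpy as np

def sample_costs(n, c_min, c_max, seed=0):
    """Draw one cost per item uniformly from [c_min, c_max)."""
    return np.random.default_rng(seed).uniform(c_min, c_max, size=n)

def ratio_greedy(f, costs, budget):
    """Hindsight greedy for an additive utility: best value-per-cost first.

    Items that no longer fit into the remaining budget are skipped.
    """
    order = np.argsort(-(f / costs))
    chosen, spent = [], 0.0
    for v in order:
        if spent + costs[v] <= budget:
            chosen.append(int(v))
            spent += float(costs[v])
    return chosen, spent

costs = sample_costs(50, 2.0, 7.0)
f = np.random.default_rng(1).random(50)   # toy non-negative utilities
chosen, spent = ratio_greedy(f, costs, budget=30.0)
print(len(chosen), round(spent, 2), round(float(f[chosen].sum()), 3))
```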

6.3 Case Study III: News Recommendation 

The Yahoo! Webscope dataset R6A* consists of more than 
45 million user visits to the Yahoo! Today module collected 
over 10 days in May 2009. The log describes the interaction 
(view/click) of each user with one randomly chosen article 
out of 271 articles. It was originally used as an unbiased 
evaluation benchmark for bandit algorithms [21, 34]. Each 
user $u$ and each article $a$ is described by a 6-dimensional 
feature vector. That is, $u \in \mathbb{R}^6$ and $a \in \mathbb{R}^6$. Thus, each 
possible interaction can be represented by a 36-dimensional 
feature vector (obtained from the vectorized outer product 
of user and item features) with a click (1) or no-click (0) as 
the outcome. Chu et al. [5] present a detailed description of 
the dataset, features and the collection methodology. 

In our experiments, we consider an application where we 
seek to select a subset of articles to present to a subset of 
users. Hence, we sequentially pick user-item pairs, aiming 
to maximize the number of clicks under a constraint on the 
number of interactions. Here, a very natural constraint is 
that we do not want to repeatedly show the same item to 
the same user. We randomly subsample 4 million user visits 
from the Webscope log and treat each interaction as an item 
with a latent reward that can be observed only when that 
item is picked. As a baseline, we also compute the best fixed 
predictor of the reward given the entire log a priori. This 
serves as an unrealistic benchmark against which to compare 
our algorithm and the other baselines. We also compare against the 
other baselines used in the vaccine design task. 

For GP-Select, we use the linear kernel to model 
similarities between the interactions. This is just the Kronecker 
product ($\otimes$) of the individual linear kernels on the users 
and items. We simulate the selection of 100,000 
interactions. The total number of clicks in the dataset (of size 4 
million) is 143,664, resulting in an average clickthrough rate 
(CTR) of about 0.0359. 
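The relationship between the 36-dimensional interaction features and the Kronecker product kernel can be checked numerically. The snippet below is a small sketch with random stand-in feature vectors (the real features come from the Webscope log); `interaction_features` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)
u1, u2, a1, a2 = rng.normal(size=(4, 6))   # stand-in user and article features

def interaction_features(u, a):
    """Vectorized outer product of user and article features (36 entries)."""
    return np.outer(u, a).ravel()

x1 = interaction_features(u1, a1)
x2 = interaction_features(u2, a2)

# A linear kernel on the interaction features factorizes into the product
# of the linear kernels on users and on articles.
lhs = x1 @ x2
rhs = (u1 @ u2) * (a1 @ a2)
print(x1.shape, np.isclose(lhs, rhs))

# Average clickthrough rate quoted in the text.
ctr = 143_664 / 4_000_000
print(round(ctr, 4))
```

This factorization is what allows the kernel on interactions to be written as a Kronecker product of a user kernel and an item kernel.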

Results. 

Of the 100,000 selected items, the hindsight-ideal 
algorithm discovers 8836 items that were eventually clicked on. 
In comparison, GP-Select discovers 8768 items, beating the 
other baselines by at least 10%. This corresponds to a CTR 
of 0.0877, which is considerably higher than the average CTR 
in our dataset. The next best approach is the Epsilon First 
approach, which randomly selects items for 20% of its 
budget and then trains a classifier to predict the reward for 
the remaining items. Detailed results are presented in 
Figure 3(a). 

Scaling to Web-Scale Datasets. 

The major bottleneck in using Gaussian processes is the 
computation of the posterior mean and variance. There 
are several works that attempt to speed up GP-based 
algorithms [20, 36], which we can immediately benefit from. 
Also, our task can be inherently parallelized by 
distributing the computation across multiple cores/machines: a 
central processor collects the top UCB scores from all the 
machines and picks the item with the best score. The reward for the 
chosen item along with the item itself is communicated to 

* http://webscope.sandbox.yahoo.com/ 















                            Naive variance update      Lazy variance update 
Avg. time for one update    5400 ms (for 4M updates)   4.6 ms (1 update) 
Number of updates           400 billion (predicted)    ~6 billion (actual) 
Execution time              150 hours (predicted)      3.9 hours (actual) 

(b) Performance Improvements 


Figure 3: Experiments on the news recommendation dataset. (a) GP-Select outperforms all the baselines 
by at least 10% while discovering almost as many clicks (8768) as the hindsight ideal (8863). (b) Our failsafe 
approach for lazy variance updates achieves almost 40X speedup. 


the worker nodes which use the information to update the 
posterior mean and variances. 

To obtain further improvements, we adapt the idea of lazy 
variance updates, originally proposed by Desautels et al. [7] 
for the bandit setting, and extend it with a novel failsafe 
variant. We note that the majority of the computation time 
is spent on computing the posterior variance update, which 
requires solving a linear system for each item. The key 
insight is that, for a given item $v$, $\sigma_t^2(v)$ is monotonically 
decreasing in $t$. We exploit this to recompute $\sigma_t^2(v)$ only for 
those items that could influence the selection in round $t$, 
via use of a priority queue. That is, in every round, we 
lazily pick the next item $v_t$ based on the variance bound 
from the previous round and update the UCB score for that 
item. If $v_t$ remains the selected item with the new score, 
we do not need to recompute the variances for the other 
items. We repeat this process until we find an item whose 
position at the head of the priority queue does not change 
after recomputation of the variance. However, note that if 
we have to recompute for many items in one round, it might 
be faster to update the variance for all items due to the 
computational overhead associated with using a priority queue 
(and the benefits of parallelism). Thus, we include a failsafe 
condition whereby, on crossing a machine- and task-dependent 
threshold on the number of lazy updates in one round, 
we switch to the full update. This eliminates the 
possibility that a large number of non-contiguous updates might 
be much slower than one full contiguous update for all the 
items. Using this technique, we achieve a reduction factor of 
almost 70 in the number of updates and an overall speedup 
of almost 40 in terms of computational time. The results 
are presented in Figure 3(b). 
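The lazy update with a failsafe can be sketched with a binary heap. The code below is a simplified, single-machine illustration under several assumptions: the function name, `beta`, `noise_var`, and the `failsafe` threshold are hypothetical, observations are noiseless, the posterior mean is recomputed in full every round, and the heap is rebuilt each round from the stale variance bounds. Only the variance recomputation is lazy.

```python
import heapq
import numpy as np

def lazy_ucb_select(K, values, budget, beta=4.0, noise_var=1e-2, failsafe=50):
    """UCB selection that recomputes variances lazily, with a full-update failsafe."""
    n = K.shape[0]
    prior_var = np.diag(K).copy()
    var_bound = prior_var.copy()      # stale upper bounds on the posterior variance
    picked, y, taken, n_recomputed = [], [], set(), 0
    for _ in range(budget):
        if picked:
            K_ss = K[np.ix_(picked, picked)] + noise_var * np.eye(len(picked))
            K_vs = K[:, picked]
            mean = K_vs @ np.linalg.solve(K_ss, np.asarray(y, dtype=float))
        else:
            K_ss, K_vs, mean = None, None, np.zeros(n)

        def exact_var(v):
            # One linear solve per item: the expensive step we try to avoid.
            if not picked:
                return float(prior_var[v])
            k = K_vs[v]
            return max(float(prior_var[v] - k @ np.linalg.solve(K_ss, k)), 0.0)

        def ucb(v):
            return float(mean[v] + np.sqrt(beta * var_bound[v]))

        remaining = [v for v in range(n) if v not in taken]
        # Max-heap via negated scores; the flag marks freshly recomputed entries.
        heap = [(-ucb(v), v, False) for v in remaining]
        heapq.heapify(heap)
        lazy = 0
        while True:
            _, v, fresh = heapq.heappop(heap)
            if fresh:
                break                 # head did not change after recomputation
            if lazy >= failsafe:      # too many lazy updates: do one full update
                for w in remaining:
                    var_bound[w] = exact_var(w)
                n_recomputed += len(remaining)
                v = max(remaining, key=ucb)
                break
            var_bound[v] = exact_var(v)
            lazy += 1
            n_recomputed += 1
            heapq.heappush(heap, (-ucb(v), v, True))
        picked.append(v)
        taken.add(v)
        y.append(values[v])
    return picked, n_recomputed

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 5))
K, values = X @ X.T, X @ rng.normal(size=5)
lazy_picks, n_lazy = lazy_ucb_select(K, values, 15, failsafe=40)
naive_picks, n_naive = lazy_ucb_select(K, values, 15, failsafe=0)
print(lazy_picks == naive_picks, n_lazy, n_naive)
```

Setting `failsafe=0` forces the full update in every round, which gives a naive reference to compare the number of variance recomputations against. Because stale variances are upper bounds, an item whose freshly recomputed score still tops the queue is also the maximizer of the exact scores.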


7. RELATED WORK 

Frequent itemset mining [?] is an important area 
of research in data mining. It attempts to produce 
subsets of items that occur together often in transactions on a 
database. However, it is very different in nature from AVID, 
the problem we address in this paper, since we do not 
optimize frequency, but (unknown) value. 

Active learning algorithms select limited training data 
in order to train a classifier or regressor. Uncertainty 
sampling, expected model improvement, expected error 
reduction, and variance reduction are some of the popular metrics in 
use in this field [28]. In (budgeted) active learning, the 
objective is to learn a function (regression or classification) as 
well as possible given a limited number of queries. In 
contrast, we do not seek to learn the function accurately, but 
only to choose items that maximize the cumulative value 
(e.g., the number of positive examples) of a function. 

Active Search aims to discover as many members of a 
given class as possible [9]. Here, the authors propose 
single-step and (computationally expensive) multi-step lookahead 
policies. It is not clear, however, how their approach can 
be applied to regression settings, and how to select diverse 
sets of items. Furthermore, they do not provide any 
performance guarantees. Wang et al. [35] extended this 
approach to present a myopic greedy algorithm that scales to 
thousands of items. Warmuth et al. [?] proposed a 
similar approach based on batch-mode active learning for drug 
discovery. The algorithms proposed in these works are 
similar to our exploit-only baseline and, further, work only for 
classification tasks. 

Multi-arm bandit (MAB) problems are sequential 
decision tasks, where one repeatedly selects among a set of 
items (“arms”) and obtains noisy estimates of their values 
[19]. They abstract the explore-exploit dilemma. In 
contrast to our setting, in MAB, arms can be selected 
repeatedly: choices made do not restrict the arms available in the 
future. In fact, our setting is a strict generalization of the 
bandit problem. Early approaches like Auer et al. [?] addressed 
the setting where utilities are considered independent across 
arms, and hence cannot generalize observations across arms. 
More recent approaches [?] address this shortcoming 
by exploiting assumptions on the regularity of the utility 
function. In particular, Srinivas et al. [30] develop a bandit 
algorithm, GP-UCB, with regret bounds whenever regularity is 
captured via a kernel function, which we build on and extend 
in our work. In other extensions (e.g., [?, 32]), the authors 
consider picking multiple arms per round. However, in these 
settings, subset selection is a repeated task with the same 
set of arms available for selection each time. Also, Kleinberg 
et al. [?] consider the case where only a subset of arms is 
available in each round. However, their results do not 
apply to our case, where arms become unavailable upon being 
selected just once. 

Stochastic Knapsacks. Budget-limited explore-exploit 
problems have been studied in the context of the stochastic 
knapsack problem. Here, the learning process is 
constrained by available resources. Gupta et al. [?] provide 
strong regret bounds for the scalar budget case. Tran-Thanh 
et al. [33] consider prior-free learning for the same 
problem. Badanidiyuru et al. [2] study the problem under 
multi-dimensional budget constraints. However, all these approaches 
consider arms as independent (i.e., uncorrelated), and hence 
do not generalize observations across similar arms as we do. 

Submodularity is a natural notion of diminishing 
returns of subsequent choices that arises in many 
applications in machine learning and other domains. A celebrated 
result about the performance of the greedy algorithm by 
Nemhauser et al. [24] allows fast yet near-optimal 
approximation algorithms for a number of NP-hard problems. 
However, these approaches assume that the underlying utility 
function is known, whereas here we attempt to learn it. 
Streeter and Golovin [?] use submodular function 
maximisation to solve online resource allocation tasks. 

Diversity-inducing rankings and selection have been 
studied in a variety of settings (e.g., [?]). In particular, 
submodular objective functions are proposed and used by Kim 
et al. [?], Lin and Bilmes [22], Streeter et al. [32], and Yue and 
Guestrin [?] to model and optimize for diverse solutions. 
These approaches provide insights on how to quantify 
preference for diversity via submodularity. However, their 
algorithms do not apply to our setting, as they consider the 
setting where sets are repeatedly selected, whereas we build 
up a single set one element at a time. 

Lazy variance updates in explore-exploit settings were 
proposed by Desautels et al. [7], who generalized the lazy 
greedy algorithm for submodular maximization [23]. We 
have adapted this approach to propose a failsafe lazy 
variance update technique that gives dramatic speedups in our 
experiments. 

8. CONCLUSIONS 

We introduced AVID - Adaptive Valuable Item Discovery, 
a novel problem setting capturing many important 
real-world problems. We presented GP-Select, a theoretically 
well-founded approach to select high-value subsets from a 
large pool of items. We further showed how it can be 
extended to select diverse subsets, by adding a submodular 
diversity term to the objective function, and how to 
handle non-uniform costs. We prove regret bounds for all these 
settings. We further demonstrated its effectiveness on three 
real-world case studies of industrial relevance. To enable the 
application of GP-Select to web-scale problems, we 
proposed a failsafe lazy evaluation technique that dramatically 
speeds up execution of GP-Select. Empirically, we find 
that GP-Select allows us to obtain a fine-grained tradeoff 
between value and diversity of the selected items. We 
believe our results present an important step towards 
addressing complex, real-world exploration-exploitation tradeoffs. 
Appendix 

Proof of Theorem 1 

We now prove Theorem 1. Our proof builds on the analysis of [30], who address the multi-armed bandit setting with RKHS 
payoff functions. A difference in our analysis is the usage of the constant $C_K$ instead of $\gamma_t$. According to the definition in 
[30], $\gamma_t$ measures the maximum mutual information $I(f_S; y_S) = \frac{1}{2}\log|I + \sigma^{-2}K_{S,S}|$ that can be extracted about $f$ using $t$ 
samples $y_S$ from $V$: 

$$\gamma_t = \max_{S \subseteq V,\ |S| \le t} I(f_S; y_S) \qquad (13)$$

But note that, given the way we have defined $C_K$, it is easy to see that it is an upper bound on $\gamma_t$. This is because we can always 
define a kernel matrix within only the most informative subset of size $t$ (say $K'$) and its corresponding $C_{K'}$, and this would 
be exactly $\gamma_t$. And $C_K \ge C_{K'}$. This is because, given the constraint of our problem setup, after $t$ rounds the algorithm 
necessarily has to have picked $t$ distinct items to evaluate. 

Apart from this, there are two important, interrelated changes from the original setting described in [30]: 

1. We must respect the additional constraint that we cannot pick the same item twice. 

2. The hindsight optimal choice is not a single action but instead a subset of elements in V. 

With these two changes, in order to prove the statement of the theorem, we need to prove a different statement of Lemma 
5.2 from [30]. The remaining part of the proof (Theorem 6, Lemmas 5.3 and 5.4) remains the same as in [30]. For the sake 
of the proof of Theorem 1, we replace Lemma 5.2 from [30] with the following Lemma 1 and prove a new statement that 
captures the main differences between the settings. Theorem 4 and Lemmas 2 and 3 are stated without proof and correspond 
exactly to Theorem 6 and Lemmas 5.3 and 5.4 of [30]. 

The first theorem establishes high-probability bounds on the utility function $f$. These carry over without modification. 
Theorem 4 (Srinivas et al., 2012). Let $\delta \in (0,1)$. Assume noise variables $\varepsilon_t$ are uniformly bounded by $\sigma$. Define: 

$$\beta_t = 2\|f\|_K^2 + 300\, C_K \log^3(t/\delta)$$

Then, $\forall v \in V$, $\forall B \ge 1$, 

$$|f(v) - \mu_{t-1}(v)| \le \beta_t^{1/2}\sigma_{t-1}(v)$$

holds with probability $\ge 1 - \delta$. 

The next lemma bounds the instantaneous regret in terms of the width of the confidence interval at the selected item. 

Lemma 1. Fix $t \in [1, B]$. If $\forall v \in V$: $|f(v) - \mu_{t-1}(v)| \le \beta_t^{1/2}\sigma_{t-1}(v)$, then the instantaneous regret $r_t$ is bounded by 
$2\beta_t^{1/2}\sigma_{t-1}(v_t)$. 

Proof: At any iteration $t \le B$, by the definitions of $v_t$ and $v_t^*$, one of the following statements is true. 

1. Our algorithm has already picked $v_t^*$ in an earlier iteration. In this case, there exists $t' < t$ such that $v_{t'}^*$ has not yet been picked and $f(v_{t'}^*) \ge f(v_t^*)$. This is because the ideal 
ordering has a non-increasing $f$ value for its elements. Hence, 

$$\mu_{t-1}(v_t) + \beta_t^{1/2}\sigma_{t-1}(v_t) \ge \mu_{t-1}(v_{t'}^*) + \beta_t^{1/2}\sigma_{t-1}(v_{t'}^*) \ge f(v_{t'}^*) \ge f(v_t^*)$$

2. Our algorithm has not yet picked $v_t^*$ in an earlier iteration. In this case, 

$$\mu_{t-1}(v_t) + \beta_t^{1/2}\sigma_{t-1}(v_t) \ge \mu_{t-1}(v_t^*) + \beta_t^{1/2}\sigma_{t-1}(v_t^*) \ge f(v_t^*)$$

Thus, in both cases, the statement of the lemma holds. 
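The final step of the argument can be written out explicitly. The sketch below assumes that the instantaneous regret is defined as $r_t = f(v_t^*) - f(v_t)$, a definition that is given in the main text rather than in this appendix. In both cases the upper confidence bound of the selected item dominates $f(v_t^*)$, and the lower confidence bound from the hypothesis of the lemma controls $f(v_t)$:

```latex
\begin{align*}
r_t &= f(v_t^*) - f(v_t) \\
    &\le \mu_{t-1}(v_t) + \beta_t^{1/2}\sigma_{t-1}(v_t) - f(v_t) \\
    &\le \mu_{t-1}(v_t) + \beta_t^{1/2}\sigma_{t-1}(v_t)
       - \bigl(\mu_{t-1}(v_t) - \beta_t^{1/2}\sigma_{t-1}(v_t)\bigr) \\
    &= 2\beta_t^{1/2}\sigma_{t-1}(v_t).
\end{align*}
```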

Lemma 2 (Srinivas et al., 2012). The information gain for the objects selected can be expressed in terms of the 
predictive variances. If $f_B = (f(v_t)) \in \mathbb{R}^B$: 

$$I(y_B; f_B) = \frac{1}{2}\sum_{t=1}^{B}\log\left(1 + \sigma^{-2}\sigma_{t-1}^2(v_t)\right)$$

Lemma 3 (Srinivas et al., 2012). Pick $\delta \in (0,1)$ and let $\beta_t$ be as defined in Theorem 4. Then, the following holds with 
probability $\ge 1 - \delta$: 

$$\sum_{t=1}^{B} r_t^2 \le \beta_B C_1 I(y_B; f_B) \le C_1 \beta_B C_K \quad \forall B \ge 1$$

Now, using the Cauchy-Schwarz inequality, $R_B^2 \le B\sum_{t=1}^{B} r_t^2$, and this proves the statement of Theorem 1. 
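For completeness, the chain of inequalities behind this last step can be spelled out. The sketch assumes that the cumulative regret is $R_B = \sum_{t=1}^{B} r_t$ and combines the Cauchy-Schwarz inequality with the bound of Lemma 3:

```latex
\begin{align*}
R_B^2 = \Bigl(\sum_{t=1}^{B} 1 \cdot r_t\Bigr)^2
      &\le \Bigl(\sum_{t=1}^{B} 1^2\Bigr)\Bigl(\sum_{t=1}^{B} r_t^2\Bigr)
       = B \sum_{t=1}^{B} r_t^2 \\
      &\le B\, C_1 \beta_B C_K
      \quad\Longrightarrow\quad
      R_B \le \sqrt{C_1 B \beta_B C_K}.
\end{align*}
```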

Proof of Theorem 2 

We use the proof techniques of [21] and its extension³. 

Denote by $S_i = \{v_1, \dots, v_i\}$ the solution set of GP-Select after $i$ iterations and by $S_i^* = \{v_1^*, \dots, v_i^*\}$ the solution set of the 
exact optimal solution after $i$ iterations. 

Given that $F(S) = (1 - \lambda)\sum_{v \in S} f(v) + \lambda D(S)$, the marginal gain of GP-Select in the $(i+1)$th step is given by: 

$$\Delta_{i+1} = F(S_i \cup \{v_{i+1}\}) - F(S_i).$$

Now, from Lemma 1 and submodularity, in each iteration, $\Delta_i$ can differ from the best greedy choice by at most the width 
of the confidence interval: 

$$\Delta_i \ge \max_{v \in V \setminus \{v_1, \dots, v_{i-1}\}} \left[F(S_{i-1} \cup \{v\}) - F(S_{i-1})\right] - \underbrace{(1 - \lambda)\, w_i(v_i)}_{\varepsilon_{i-1}}$$

where $w_i(v_i) = 2\beta_i^{1/2}\sigma_{i-1}(v_i)$. 

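Since $F$ is monotone, 

$$F(S_i \cup S_B^*) \ge F(S_B^*)$$

But also, by definition of $S_B^*$, for all $i = 0, \dots, B-1$, 

$$F(S_i \cup S_B^*) \le F(S_i) + B(\Delta_{i+1} + \varepsilon_i) = \sum_{j=1}^{i}\Delta_j + B(\Delta_{i+1} + \varepsilon_i)$$

We can then get the following inequalities: 

$$F(S_B^*) \le B(\Delta_1 + \varepsilon_0)$$

$$F(S_B^*) \le \Delta_1 + B(\Delta_2 + \varepsilon_1)$$

$$\vdots$$

$$F(S_B^*) \le \sum_{j=1}^{B-1}\Delta_j + B(\Delta_B + \varepsilon_{B-1})$$

Multiplying both sides of the $i$th inequality by $(1 - \frac{1}{B})^{B-i}$ and adding all the inequalities, we get 

$$\left(\sum_{i=0}^{B-1}\left(1 - \frac{1}{B}\right)^{i}\right) F(S_B^*) \le B\sum_{i=1}^{B}(\Delta_i + \varepsilon_{i-1}) = B\Bigl(F(S_B) + \underbrace{\sum_{i=1}^{B}\varepsilon_{i-1}}_{R_B}\Bigr)$$

Further, we can simplify this to 

$$F(S_B) + R_B \ge \left(1 - (1 - 1/B)^B\right) F(S_B^*) \ge (1 - 1/e)\, F(S_B^*)$$

From Theorem 1, we can bound $R_B = \sum_{i=1}^{B}\varepsilon_{i-1}$ by 
$\sqrt{C_1 B \beta_B C_K}$ $\forall B \ge 1$, thus proving the claim of Theorem 2. 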
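The simplification step in this proof relies on a geometric series identity and a standard bound on $(1 - 1/B)^B$. Both are elementary facts, written out here as a short sketch for the reader's convenience:

```latex
\begin{align*}
\sum_{i=0}^{B-1}\Bigl(1 - \frac{1}{B}\Bigr)^{i}
  &= \frac{1 - (1 - 1/B)^{B}}{1 - (1 - 1/B)}
   = B\Bigl(1 - \Bigl(1 - \frac{1}{B}\Bigr)^{B}\Bigr), \\
\Bigl(1 - \frac{1}{B}\Bigr)^{B} &\le e^{-1}
  \quad\text{since } 1 - x \le e^{-x} \text{ for all } x .
\end{align*}
```

Dividing the summed inequality by $B$ therefore leaves the factor $1 - (1 - 1/B)^B \ge 1 - 1/e$ in front of $F(S_B^*)$.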

Proof of Theorem 3 

(For ease of presentation we use $c_j$ to denote $c_{v_j}$ when there is no confusion.) Also, without loss of 
generality, we assume that $c_{\min} \ge 1$. Our proof is adapted from [?]. We consider a modified version of the greedy algorithm that 
is allowed to pick from only those elements whose individual costs are less than the budget $B$. Let $(S_j)_j$ be the sequence of 
subsets chosen by this greedy algorithm, $S_1 \subseteq S_2 \subseteq S_3 \dots$. Let $l$ be the maximum index such that $C(S_l) \le B$. We will show 
that $F(S_{l+1})$ is nearly optimal. And then, it is easy to see that $F(S_l) \ge F(S_{l+1}) - \max_{v \in V} f(v)$. In order to prove the theorem, 
we require the following lemma. 


Lemma 4. If $F$ is submodular, $S^* \subseteq V$ is the optimal subset under budget $B$, and we run the modified greedy procedure 
picking elements $\{v_1, v_2, \dots\}$ in that order, resulting in sets $S_1 \subseteq S_2 \subseteq S_3 \dots$, then 

$$F(S^*) \le F(S_j) + B s_{j+1} + \frac{B}{c_{j+1}}\varepsilon_{j+1}$$

where $s_{j+1} = \frac{F(S_{j+1}) - F(S_j)}{c_{j+1}}$ and $\varepsilon_{j+1}$ is the error in estimating the marginal gain of step $j+1$. 

³ A. Krause, A. Singh, and C. Guestrin. Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms 
and empirical studies. JMLR, 9:235-284, Feb 2008 

Proof: Let $S^* \setminus S_j = \{o_1, o_2, \dots, o_m\}$. 

Then, 

$$F(S^*) \le F(S_j \cup S^*)$$

$$\le F(S_j) + \sum_{i=1}^{m}\Delta(o_i \mid S_j)$$

$$\le F(S_j) + B\,\frac{F(S_{j+1}) - F(S_j) + \varepsilon_{j+1}}{c_{j+1}}$$

$$= F(S_j) + B s_{j+1} + \frac{B}{c_{j+1}}\varepsilon_{j+1}$$

where the second inequality is due to submodularity and the third inequality is due to the greedy selection rule. 

For running GP-Select, $\varepsilon_{j+1}$ is the instantaneous regret, which is upper bounded by the width of the confidence interval, 
$\sigma_{S_{j-1}}(v_j)$. 

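Now we are ready to prove Theorem 3. 

Consider $S_{l+1}$, the result of the greedy algorithm that just becomes infeasible. Let $\Delta_j = F(S^*) - F(S_j)$. Then 

$$\Delta_j \le B s_{j+1} + \frac{B}{c_{j+1}}\varepsilon_{j+1} \quad \text{(from Lemma 4)}$$

$$= B\,\frac{\Delta_j - \Delta_{j+1} + \varepsilon_{j+1}}{c_{j+1}}$$

Rearranging the terms and using $c_{j+1} \ge c_{\min} \ge 1$, we get $\Delta_{j+1} \le \Delta_j\left(1 - \frac{c_{j+1}}{B}\right) + c_{j+1}\varepsilon_{j+1}$. 

Using the fact that $1 - \frac{c_{j+1}}{B} < 1$, and using the telescopic sum, we get 

$$\Delta_{l+1} \le \Delta_1\left(\prod_{j=1}^{l}\left(1 - \frac{c_{j+1}}{B}\right)\right) + \sum_{j=1}^{l} c_{j+1}\varepsilon_{j+1}$$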

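Note that the product series is maximised when $c_{j+1} = \frac{B}{l}$. Thus, 

$$\Delta_{l+1} \le \Delta_1\left(1 - \frac{1}{l}\right)^{l} + \sum_{j=1}^{l} c_{j+1}\varepsilon_{j+1}$$

$$\le \Delta_1\,\frac{1}{e} + \sum_{j=1}^{l} c_{j+1}\varepsilon_{j+1}$$

$$\le F(S^*)\,\frac{1}{e} + \sum_{j=1}^{l} c_{j+1}\varepsilon_{j+1}$$

$$\le F(S^*)\,\frac{1}{e} + c_{\max}\sum_{j=1}^{l}\varepsilon_{j+1}$$

$$\le F(S^*)\,\frac{1}{e} + c_{\max} R_B$$

Thus, $F(S_{l+1}) \ge \left(1 - \frac{1}{e}\right) F(S^*) - c_{\max} R_B$ and $F(S_l) \ge F(S_{l+1}) - \max_{v \in V} f(v)$. 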







